{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "_dEaVsqSgNyQ"
      },
      "source": [
        "##### Copyright 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "4FyfuZX-gTKS"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "sT8AyHRMNh41"
      },
      "source": [
        "# TensorFlow Recommenders: Quickstart\n",
        "\n",
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/recommenders/quickstart\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/quickstart.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/recommenders/blob/main/docs/examples/quickstart.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/recommenders/docs/examples/quickstart.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8f-reQ11gbLB"
      },
      "source": [
        "In this tutorial, we build a simple matrix factorization model using the [MovieLens 100K dataset](https://grouplens.org/datasets/movielens/100k/) with TFRS. We can use this model to recommend movies for a given user."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "qA00wBE2Ntdm"
      },
      "source": [
        "### Import TFRS\n",
        "\n",
        "First, install and import TFRS:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "6yzAaM85Z12D"
      },
      "outputs": [],
      "source": [
        "!pip install -q tensorflow-recommenders\n",
        "!pip install -q --upgrade tensorflow-datasets"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "n3oYt3R6Nr9l"
      },
      "outputs": [],
      "source": [
        "from typing import Dict, Text\n",
        "\n",
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "\n",
        "import tensorflow_datasets as tfds\n",
        "import tensorflow_recommenders as tfrs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "zCxQ1CZcO2wh"
      },
      "source": [
        "### Read the data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "M-mxBYjdO5m7"
      },
      "outputs": [],
      "source": [
        "# Ratings data.\n",
        "ratings = tfds.load('movielens/100k-ratings', split=\"train\")\n",
        "# Features of all the available movies.\n",
        "movies = tfds.load('movielens/100k-movies', split=\"train\")\n",
        "\n",
        "# Select the basic features.\n",
        "ratings = ratings.map(lambda x: {\n",
        "    \"movie_title\": x[\"movie_title\"],\n",
        "    \"user_id\": x[\"user_id\"]\n",
        "})\n",
        "movies = movies.map(lambda x: x[\"movie_title\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "5W0HSfmSNCWm"
      },
      "source": [
        "Build vocabularies to convert user ids and movie titles into integer indices for embedding layers:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "9I1VTEjHzpfX"
      },
      "outputs": [],
      "source": [
        "user_ids_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)\n",
        "user_ids_vocabulary.adapt(ratings.map(lambda x: x[\"user_id\"]))\n",
        "\n",
        "movie_titles_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)\n",
        "movie_titles_vocabulary.adapt(movies)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Lrch6rVBOB9Q"
      },
      "source": [
        "### Define a model\n",
        "\n",
        "We can define a TFRS model by inheriting from `tfrs.Model` and implementing the `compute_loss` method:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "e5dNbDZwOIHR"
      },
      "outputs": [],
      "source": [
        "class MovieLensModel(tfrs.Model):\n",
        "  # We derive from a custom base class to help reduce boilerplate. Under the hood,\n",
        "  # these are still plain Keras Models.\n",
        "\n",
        "  def __init__(\n",
        "      self,\n",
        "      user_model: tf.keras.Model,\n",
        "      movie_model: tf.keras.Model,\n",
        "      task: tfrs.tasks.Retrieval):\n",
        "    super().__init__()\n",
        "\n",
        "    # Set up user and movie representations.\n",
        "    self.user_model = user_model\n",
        "    self.movie_model = movie_model\n",
        "\n",
        "    # Set up a retrieval task.\n",
        "    self.task = task\n",
        "\n",
        "  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -\u003e tf.Tensor:\n",
        "    # Define how the loss is computed.\n",
        "\n",
        "    user_embeddings = self.user_model(features[\"user_id\"])\n",
        "    movie_embeddings = self.movie_model(features[\"movie_title\"])\n",
        "\n",
        "    return self.task(user_embeddings, movie_embeddings)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "wdwtgUCEOI8y"
      },
      "source": [
        "Define the two models and the retrieval task."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "EvtnUN6aUY4U"
      },
      "outputs": [],
      "source": [
        "# Define user and movie models.\n",
        "user_model = tf.keras.Sequential([\n",
        "    user_ids_vocabulary,\n",
        "    tf.keras.layers.Embedding(user_ids_vocabulary.vocab_size(), 64)\n",
        "])\n",
        "movie_model = tf.keras.Sequential([\n",
        "    movie_titles_vocabulary,\n",
        "    tf.keras.layers.Embedding(movie_titles_vocabulary.vocab_size(), 64)\n",
        "])\n",
        "\n",
        "# Define your objectives.\n",
        "task = tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(\n",
        "    movies.batch(128).map(movie_model)\n",
        "  )\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "BMV0HpzmJGWk"
      },
      "source": [
        "\n",
        "### Fit and evaluate it.\n",
        "\n",
        "Create the model, train it, and generate predictions:\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "H2tQDhqkOKf1"
      },
      "outputs": [],
      "source": [
        "# Create a retrieval model.\n",
        "model = MovieLensModel(user_model, movie_model, task)\n",
        "model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))\n",
        "\n",
        "# Train for 3 epochs.\n",
        "model.fit(ratings.batch(4096), epochs=3)\n",
        "\n",
        "# Use brute-force search to set up retrieval using the trained representations.\n",
        "index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)\n",
        "index.index(movies.batch(100).map(model.movie_model), movies)\n",
        "\n",
        "# Get some recommendations.\n",
        "_, titles = index(np.array([\"42\"]))\n",
        "print(f\"Top 3 recommendations for user 42: {titles[0, :3]}\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "neJAJVwbReNd"
      },
      "outputs": [],
      "source": [
        ""
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "quickstart.ipynb",
      "private_outputs": true,
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
